Airbnb && Zillow 🖋📝 - EDA📚 & Data Analysis🎯
Sreeja Adavalli
Problem Statement:
You are consulting for a real estate company that has a niche in purchasing properties to rent out short-term as part of their business model specifically within New York City. The real estate company has already concluded that two bedroom properties are the most profitable; however, they do not know which zip codes are the best to invest in.
The real estate company has engaged your firm to build out a data product and provide your conclusions to help them understand which zip codes would generate the most profit on short term rentals within New York City.
You will be looking at publicly available data from Zillow and AirBnB: Cost data: Zillow provides us an estimate of value for two-bedroom properties Revenue data: AirBnB is the medium through which the investor plans to lease out their investment property.
Target: Finding the best zipcodes and area to buy two bedroom homes and rent it out to airbnb with assumption of 75% occupancy rate and no time value of money discount rate
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import pylab
import plotly
import plotly_express as px
import pandas_profiling
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 1000)
import scipy.stats as stats
from geopy.geocoders import Nominatim #might require 'geopy' installation in python environment
geolocator = Nominatim(user_agent="geoapiExercises")
import re
import matplotlib.pyplot as plt
import seaborn as sns
air_bnb_1 = pd.read_csv('listings file 1 of 4.csv')
air_bnb_2 = pd.read_csv('listings file 2 of 4.csv', names = air_bnb_1.columns)
air_bnb_3 = pd.read_csv('listings file 3 of 4.csv', names = air_bnb_1.columns)
air_bnb_4 = pd.read_csv('listings file 4 of 4.csv',names = air_bnb_1.columns)
zillow = pd.read_csv('Zip_Zhvi_2bedroom.csv')
#let's print shape of data
print(f"shape of first file : {air_bnb_1.shape}")
print(f"shape of second file : {air_bnb_2.shape}")
print(f"shape of third file : {air_bnb_3.shape}")
print(f"shape of fourth file : {air_bnb_4.shape}")
shape of first file : (9999, 95) shape of second file : (10000, 95) shape of third file : (10000, 95) shape of fourth file : (10754, 95)
Only the first csv file contains column names and all other files have equal number of columns so the files would be concatenated into one files using the first csv file
Assigning column headers to air_bnb_2, air_bnb_3, air_bnb_4 from air_bnb_1 dataset
concat_keys = air_bnb_1.columns
def concat_data(*files):
"""Function to concatenate diffrent csv files into one file"""
##loop through files and append to df_list using list comprehension
df_list = [file for file in files]
##concat files using pandas
result = pd.concat(df_list, ignore_index=True, keys = concat_keys )
#return result
return result
##apply concat_data fucntion and save in new dataframe
air_bnb = concat_data(air_bnb_1, air_bnb_2, air_bnb_3, air_bnb_4)
air_bnb.columns
Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'summary',
'space', 'description', 'experiences_offered', 'neighborhood_overview',
'notes', 'transit', 'access', 'interaction', 'house_rules',
'thumbnail_url', 'medium_url', 'picture_url', 'xl_picture_url',
'host_id', 'host_url', 'host_name', 'host_since', 'host_location',
'host_about', 'host_response_time', 'host_response_rate',
'host_acceptance_rate', 'host_is_superhost', 'host_thumbnail_url',
'host_picture_url', 'host_neighbourhood', 'host_listings_count',
'host_total_listings_count', 'host_verifications',
'host_has_profile_pic', 'host_identity_verified', 'street',
'neighbourhood', 'neighbourhood_cleansed',
'neighbourhood_group_cleansed', 'city', 'state', 'zipcode', 'market',
'smart_location', 'country_code', 'country', 'latitude', 'longitude',
'is_location_exact', 'property_type', 'room_type', 'accommodates',
'bathrooms', 'bedrooms', 'beds', 'bed_type', 'amenities', 'square_feet',
'price', 'weekly_price', 'monthly_price', 'security_deposit',
'cleaning_fee', 'guests_included', 'extra_people', 'minimum_nights',
'maximum_nights', 'calendar_updated', 'has_availability',
'availability_30', 'availability_60', 'availability_90',
'availability_365', 'calendar_last_scraped', 'number_of_reviews',
'first_review', 'last_review', 'review_scores_rating',
'review_scores_accuracy', 'review_scores_cleanliness',
'review_scores_checkin', 'review_scores_communication',
'review_scores_location', 'review_scores_value', 'requires_license',
'license', 'jurisdiction_names', 'instant_bookable',
'cancellation_policy', 'require_guest_profile_picture',
'require_guest_phone_verification', 'calculated_host_listings_count',
'reviews_per_month'],
dtype='object')
air_bnb.shape
(40753, 95)
air_bnb.head(2)
| id | listing_url | scrape_id | last_scraped | name | summary | space | description | experiences_offered | neighborhood_overview | notes | transit | access | interaction | house_rules | thumbnail_url | medium_url | picture_url | xl_picture_url | host_id | host_url | host_name | host_since | host_location | host_about | host_response_time | host_response_rate | host_acceptance_rate | host_is_superhost | host_thumbnail_url | host_picture_url | host_neighbourhood | host_listings_count | host_total_listings_count | host_verifications | host_has_profile_pic | host_identity_verified | street | neighbourhood | neighbourhood_cleansed | neighbourhood_group_cleansed | city | state | zipcode | market | smart_location | country_code | country | latitude | longitude | is_location_exact | property_type | room_type | accommodates | bathrooms | bedrooms | beds | bed_type | amenities | square_feet | price | weekly_price | monthly_price | security_deposit | cleaning_fee | guests_included | extra_people | minimum_nights | maximum_nights | calendar_updated | has_availability | availability_30 | availability_60 | availability_90 | availability_365 | calendar_last_scraped | number_of_reviews | first_review | last_review | review_scores_rating | review_scores_accuracy | review_scores_cleanliness | review_scores_checkin | review_scores_communication | review_scores_location | review_scores_value | requires_license | license | jurisdiction_names | instant_bookable | cancellation_policy | require_guest_profile_picture | require_guest_phone_verification | calculated_host_listings_count | reviews_per_month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7949480 | https://www.airbnb.com/rooms/7949480 | 2.020000e+13 | 5/3/17 | City Island Sanctuary relaxing BR & Bath w Par... | Come relax on City Island in our quiet guest r... | On parle français et anglais, (lire Français... | Come relax on City Island in our quiet guest r... | none | City Island is a unique sanctuary in New York ... | We are walkable to Pelham Park which is the la... | On the island, you can walk to everything. A ... | Feel free to enjoy our large backyard in seaso... | We speak fluent French and English. We will b... | No extra guests or visitors who are not in the... | https://a0.muscache.com/im/pictures/001d1cb0-6... | https://a0.muscache.com/im/pictures/001d1cb0-6... | https://a0.muscache.com/im/pictures/001d1cb0-6... | https://a0.muscache.com/im/pictures/001d1cb0-6... | 119445 | https://www.airbnb.com/users/show/119445 | Linda & Didier | 5/6/10 | New York, New York, United States | I am a designer, former instructor at a design... | within an hour | 100% | NaN | t | https://a0.muscache.com/im/pictures/2de30fe4-b... | https://a0.muscache.com/im/pictures/2de30fe4-b... | City Island | 1.0 | 1.0 | ['email', 'phone', 'reviews', 'jumio'] | t | t | City Island, Bronx, NY 10464, United States | City Island | City Island | Bronx | Bronx | NY | 10464 | New York | Bronx, NY | US | United States | 40.852054 | -73.788680 | t | House | Private room | 2 | 1.0 | 1.0 | 1.0 | Real Bed | {"Cable TV","Wireless Internet","Air condition... | NaN | $99.00 | NaN | NaN | $100.00 | NaN | 1 | $20.00 | 1 | 7 | yesterday | NaN | 24 | 54 | 80 | 170 | 5/3/17 | 25 | 1/18/16 | 4/23/17 | 100.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | f | NaN | NaN | f | moderate | t | t | 1 | 1.59 |
| 1 | 16042478 | https://www.airbnb.com/rooms/16042478 | 2.020000e+13 | 5/4/17 | WATERFRONT STUDIO APARTMENT | My place is close to Sea Shore. You’ll love ... | (URL HIDDEN) | My place is close to Sea Shore. You’ll love ... | none | (URL HIDDEN) | Fine sea food restaurant, bars and night clubs... | NaN | public transport from Grand Central, NYC or fr... | NaN | NaN | https://a0.muscache.com/im/pictures/bb5bc3c4-3... | https://a0.muscache.com/im/pictures/bb5bc3c4-3... | https://a0.muscache.com/im/pictures/bb5bc3c4-3... | https://a0.muscache.com/im/pictures/bb5bc3c4-3... | 9117975 | https://www.airbnb.com/users/show/9117975 | Collins | 9/29/13 | New York, New York, United States | I am married with 3 children and 2 grandkid. ... | a few days or more | 0% | NaN | f | https://a0.muscache.com/im/users/9117975/profi... | https://a0.muscache.com/im/users/9117975/profi... | City Island | 1.0 | 1.0 | ['phone', 'facebook'] | t | f | City Island, Bronx, NY 10464, United States | City Island | City Island | Bronx | Bronx | NY | 10464 | New York | Bronx, NY | US | United States | 40.853491 | -73.788607 | t | Apartment | Private room | 4 | 1.0 | 1.0 | 1.0 | Real Bed | {TV,Internet,"Wireless Internet","Air conditio... | NaN | $200.00 | NaN | NaN | NaN | NaN | 1 | $0.00 | 7 | 29 | 6 months ago | NaN | 30 | 60 | 90 | 180 | 5/4/17 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | f | NaN | NaN | t | flexible | f | f | 1 | NaN |
columns_selected = ["state", "host_id", "host_since", "host_is_superhost", "host_total_listings_count", "neighbourhood_cleansed", "neighbourhood_group_cleansed", "room_type", "bedrooms", "bathrooms",
"square_feet", "minimum_nights", "maximum_nights","number_of_reviews", "last_review", "review_scores_rating", "security_deposit", "cleaning_fee", "city", "latitude", "longitude",
"zipcode", "price", "weekly_price", "monthly_price", "availability_30", "availability_60", "availability_90", "availability_365", "cancellation_policy"]
air_bnb_col = air_bnb[columns_selected]
air_bnb_col.columns
Index(['state', 'host_id', 'host_since', 'host_is_superhost',
'host_total_listings_count', 'neighbourhood_cleansed',
'neighbourhood_group_cleansed', 'room_type', 'bedrooms', 'bathrooms',
'square_feet', 'minimum_nights', 'maximum_nights', 'number_of_reviews',
'last_review', 'review_scores_rating', 'security_deposit',
'cleaning_fee', 'city', 'latitude', 'longitude', 'zipcode', 'price',
'weekly_price', 'monthly_price', 'availability_30', 'availability_60',
'availability_90', 'availability_365', 'cancellation_policy'],
dtype='object')
#Generating profiling report
pandas_profiling.ProfileReport(air_bnb_col, minimal=True)
Data quality checklist:
Duplication : The percentage of duplicated values is just under 5 percent which is a minimal and negligible number which is to be expected for any dataset. Hence the dataset passed the duplication test
Consolidation: Datasets are scattered around and require a lot of effort to acquire and merge together. Therefore, the dataset does not pass the consolidation checklist.
Missing values: There are quite a number of missing entries in dataset and some of the columns would have to be dropped and few have to be fixed by imputing relevant values.
Relevance: A close inspection of the columns in the dataset shows that the dataset contains data which is relevant and can be used to derive insight to drive business decisions
print(f"Total number of duplicated entries : {air_bnb_col.duplicated().sum()}")
Total number of duplicated entries : 0
air_bnb_col['state'].unique()
array(['NY', 'MP', 'New York', 'ny', 'NJ', 'VT'], dtype=object)
air_bnb_col['state'].replace("ny", "NY", inplace=True)
air_bnb_col['state'].replace("New York", "NY", inplace=True)
air_bnb_col['state'].unique()
array(['NY', 'MP', 'NJ', 'VT'], dtype=object)
air_bnb_ny = air_bnb_col[(air_bnb_col['state']== 'NY') & (air_bnb_col['bedrooms'] == 2)]
air_bnb_ny['state'].unique()
array(['NY'], dtype=object)
air_bnb_ny.shape
(4893, 30)
air_bnb_ny['neighbourhood_group_cleansed'].unique()
array(['Bronx', 'Queens', 'Staten Island', 'Brooklyn', 'Manhattan'],
dtype=object)
c = ['price','weekly_price','monthly_price','cleaning_fee','security_deposit']
air_bnb_ny[c][:10]
| price | weekly_price | monthly_price | cleaning_fee | security_deposit | |
|---|---|---|---|---|---|
| 12 | $130.00 | NaN | NaN | NaN | $600.00 |
| 23 | $150.00 | NaN | NaN | $75.00 | $1,500.00 |
| 33 | $200.00 | NaN | NaN | NaN | NaN |
| 36 | $250.00 | NaN | NaN | NaN | NaN |
| 38 | $79.00 | NaN | NaN | $400.00 | $2,000.00 |
| 40 | $225.00 | NaN | NaN | $95.00 | $250.00 |
| 42 | $100.00 | NaN | NaN | $70.00 | $300.00 |
| 43 | $160.00 | $1,100.00 | $3,600.00 | $75.00 | NaN |
| 61 | $100.00 | NaN | NaN | $20.00 | $100.00 |
| 65 | $252.00 | $1,927.00 | $7,307.00 | $85.00 | $100.00 |
def percent_of_nulls(x):
return(x.isnull().sum()/len(x)*100)
percent_of_nulls(air_bnb_ny)
state 0.000000 host_id 0.000000 host_since 0.388310 host_is_superhost 0.388310 host_total_listings_count 0.388310 neighbourhood_cleansed 0.000000 neighbourhood_group_cleansed 0.000000 room_type 0.000000 bedrooms 0.000000 bathrooms 0.163499 square_feet 97.690578 minimum_nights 0.000000 maximum_nights 0.000000 number_of_reviews 0.000000 last_review 19.517678 review_scores_rating 20.825669 security_deposit 42.693644 cleaning_fee 19.722052 city 0.102187 latitude 0.000000 longitude 0.000000 zipcode 1.267116 price 0.000000 weekly_price 78.929082 monthly_price 82.607807 availability_30 0.000000 availability_60 0.000000 availability_90 0.000000 availability_365 0.000000 cancellation_policy 0.000000 dtype: float64
air_bnb_ny['price'].replace({'\$':''}, regex = True, inplace=True)
air_bnb_ny['price'].replace({',':''}, regex = True, inplace=True)
air_bnb_ny['price'] = air_bnb_ny['price'].astype("float")
air_bnb_ny['cleaning_fee'].replace({'\$':''}, regex = True, inplace=True)
air_bnb_ny['cleaning_fee'].replace({',':''}, regex = True, inplace=True)
air_bnb_ny['cleaning_fee'] = air_bnb_ny['cleaning_fee'].astype("float")
air_bnb_ny['price'].describe()
count 4893.000000 mean 232.064786 std 270.359138 min 10.000000 25% 135.000000 50% 190.000000 75% 275.000000 max 9999.000000 Name: price, dtype: float64
air_bnb_ny['cleaning_fee'].describe()
count 3928.000000 mean 85.823065 std 44.743311 min 0.000000 25% 50.000000 50% 80.000000 75% 100.000000 max 600.000000 Name: cleaning_fee, dtype: float64
mode = air_bnb_ny['cleaning_fee'].mode()
fig, ax = plt.subplots(figsize=(10,8))
sns.distplot(air_bnb_ny.cleaning_fee)
<AxesSubplot:xlabel='cleaning_fee', ylabel='Density'>
air_bnb_ny['cleaning_fee'] = np.where(air_bnb_ny['cleaning_fee'].isnull(), mode, air_bnb_ny['cleaning_fee'])
fig, ax = plt.subplots(figsize=(10,8))
sns.boxplot(air_bnb_ny.cleaning_fee)
<AxesSubplot:xlabel='cleaning_fee'>
fig, ax = plt.subplots(figsize=(10,8))
sns.boxplot(air_bnb_ny.price)
<AxesSubplot:xlabel='price'>
mean_price = np.mean(air_bnb_ny['price'], axis=0)
sd_price = np.std(air_bnb_ny['price'], axis=0)
air_bnb_ny['price'][air_bnb_ny['price'] > mean_price+2*sd_price].sort_values().unique()
array([ 775., 789., 790., 795., 799., 800., 844., 850., 875.,
877., 878., 899., 900., 925., 950., 975., 985., 989.,
999., 1000., 1050., 1099., 1100., 1115., 1120., 1150., 1200.,
1500., 1600., 1650., 2000., 2500., 2695., 2750., 3150., 3750.,
4700., 9600., 9999.])
mean_cleanfee = np.mean(air_bnb_ny['cleaning_fee'], axis=0)
sd_cleanfee = np.std(air_bnb_ny['cleaning_fee'], axis=0)
air_bnb_ny['cleaning_fee'][air_bnb_ny['cleaning_fee'] > mean_cleanfee+2*sd_cleanfee].sort_values().unique()
array([170., 175., 180., 185., 186., 190., 195., 198., 199., 200., 210.,
225., 230., 250., 290., 295., 299., 300., 350., 400., 500., 600.])
air_bnb_ny_norm = air_bnb_ny[(air_bnb_ny['price'] < mean_price +2*sd_price) &
(air_bnb_ny['cleaning_fee'] < mean_cleanfee +2*sd_cleanfee)]
stats.probplot(air_bnb_ny_norm['price'], dist="norm", plot=pylab)
pylab.show()
stats.probplot(air_bnb_ny_norm['cleaning_fee'], dist="norm", plot=pylab)
pylab.show()
air_bnb_ny_norm['price'].describe()
count 4685.000000 mean 208.692850 std 109.848134 min 10.000000 25% 130.000000 50% 182.000000 75% 250.000000 max 760.000000 Name: price, dtype: float64
air_bnb_ny_norm['cleaning_fee'].describe()
count 4685.000000 mean 84.287513 std 31.997817 min 0.000000 25% 60.000000 50% 99.000000 75% 100.000000 max 169.000000 Name: cleaning_fee, dtype: float64
air_bnb_ny_norm.shape
(4685, 30)
missing_zipcode = air_bnb_ny_norm[air_bnb_ny_norm['zipcode'].isnull()]
len(missing_zipcode)
60
air_bnb_ny_subset=air_bnb_ny_norm[~air_bnb_ny_norm.isin(missing_zipcode)].dropna(how = 'all')
zips=[]
for i,j in zip(missing_zipcode.latitude,missing_zipcode.longitude):
zips.append(geolocator.reverse(str(i)+','+str(j)).raw['address']['postcode'])
index = missing_zipcode[missing_zipcode['zipcode'].isnull()].index
missing_zipcode.zipcode[index.values] = zips
air_bnb_ny_norm_new = pd.concat([air_bnb_ny_subset, missing_zipcode], ignore_index=True)
len(air_bnb_ny_norm_new[air_bnb_ny_norm_new.zipcode.isnull()])
0
len(air_bnb_ny_norm_new.zipcode.unique())
175
air_bnb_ny_norm_new['zipcode'] = air_bnb_ny_norm_new['zipcode'].str.slice(0,5)
clean_zip = []
for z in air_bnb_ny_norm_new['zipcode'].astype('str').values:
clean_zip.append((z.split('-')[0]).replace('.0',''))
air_bnb_ny_norm_new['zipcode'] = clean_zip
len(air_bnb_ny_norm_new['zipcode'].unique())
147
# explore the categories for 'room_type'
sns.countplot(air_bnb_ny_norm_new['room_type'])
<AxesSubplot:xlabel='room_type', ylabel='count'>
zillow.shape
(8946, 262)
zillow.head()
| RegionID | RegionName | City | State | Metro | CountyName | SizeRank | 1996-04 | 1996-05 | 1996-06 | 1996-07 | 1996-08 | 1996-09 | 1996-10 | 1996-11 | 1996-12 | 1997-01 | 1997-02 | 1997-03 | 1997-04 | 1997-05 | 1997-06 | 1997-07 | 1997-08 | 1997-09 | 1997-10 | 1997-11 | 1997-12 | 1998-01 | 1998-02 | 1998-03 | 1998-04 | 1998-05 | 1998-06 | 1998-07 | 1998-08 | 1998-09 | 1998-10 | 1998-11 | 1998-12 | 1999-01 | 1999-02 | 1999-03 | 1999-04 | 1999-05 | 1999-06 | 1999-07 | 1999-08 | 1999-09 | 1999-10 | 1999-11 | 1999-12 | 2000-01 | 2000-02 | 2000-03 | 2000-04 | 2000-05 | 2000-06 | 2000-07 | 2000-08 | 2000-09 | 2000-10 | 2000-11 | 2000-12 | 2001-01 | 2001-02 | 2001-03 | 2001-04 | 2001-05 | 2001-06 | 2001-07 | 2001-08 | 2001-09 | 2001-10 | 2001-11 | 2001-12 | 2002-01 | 2002-02 | 2002-03 | 2002-04 | 2002-05 | 2002-06 | 2002-07 | 2002-08 | 2002-09 | 2002-10 | 2002-11 | 2002-12 | 2003-01 | 2003-02 | 2003-03 | 2003-04 | 2003-05 | 2003-06 | 2003-07 | 2003-08 | 2003-09 | 2003-10 | 2003-11 | 2003-12 | 2004-01 | 2004-02 | 2004-03 | 2004-04 | 2004-05 | 2004-06 | 2004-07 | 2004-08 | 2004-09 | 2004-10 | 2004-11 | 2004-12 | 2005-01 | 2005-02 | 2005-03 | 2005-04 | 2005-05 | 2005-06 | 2005-07 | 2005-08 | 2005-09 | 2005-10 | 2005-11 | 2005-12 | 2006-01 | 2006-02 | 2006-03 | 2006-04 | 2006-05 | 2006-06 | 2006-07 | 2006-08 | 2006-09 | 2006-10 | 2006-11 | 2006-12 | 2007-01 | 2007-02 | 2007-03 | 2007-04 | 2007-05 | 2007-06 | 2007-07 | 2007-08 | 2007-09 | 2007-10 | 2007-11 | 2007-12 | 2008-01 | 2008-02 | 2008-03 | 2008-04 | 2008-05 | 2008-06 | 2008-07 | 2008-08 | 2008-09 | 2008-10 | 2008-11 | 2008-12 | 2009-01 | 2009-02 | 2009-03 | 2009-04 | 2009-05 | 2009-06 | 2009-07 | 2009-08 | 2009-09 | 2009-10 | 2009-11 | 2009-12 | 2010-01 | 2010-02 | 2010-03 | 2010-04 | 2010-05 | 2010-06 | 2010-07 | 2010-08 | 2010-09 | 2010-10 | 2010-11 | 2010-12 | 2011-01 | 2011-02 | 2011-03 | 2011-04 | 2011-05 | 2011-06 | 2011-07 | 2011-08 | 2011-09 | 2011-10 | 2011-11 | 2011-12 | 2012-01 | 2012-02 | 2012-03 | 2012-04 | 2012-05 | 2012-06 | 2012-07 | 2012-08 | 2012-09 | 2012-10 | 2012-11 | 2012-12 | 2013-01 | 2013-02 | 2013-03 | 2013-04 | 2013-05 | 2013-06 | 2013-07 | 2013-08 | 2013-09 | 2013-10 | 2013-11 | 2013-12 | 2014-01 | 2014-02 | 2014-03 | 2014-04 | 2014-05 | 2014-06 | 2014-07 | 2014-08 | 2014-09 | 2014-10 | 2014-11 | 2014-12 | 2015-01 | 2015-02 | 2015-03 | 2015-04 | 2015-05 | 2015-06 | 2015-07 | 2015-08 | 2015-09 | 2015-10 | 2015-11 | 2015-12 | 2016-01 | 2016-02 | 2016-03 | 2016-04 | 2016-05 | 2016-06 | 2016-07 | 2016-08 | 2016-09 | 2016-10 | 2016-11 | 2016-12 | 2017-01 | 2017-02 | 2017-03 | 2017-04 | 2017-05 | 2017-06 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 61627 | 10013 | New York | NY | New York | New York | 1744 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1428000.0 | 1444100.0 | 1458700.0 | 1502700.0 | 1570100.0 | 1638000.0 | 1687000.0 | 1718300.0 | 1747600.0 | 1753300.0 | 1706600.0 | 1660500.0 | 1650500.0 | 1642200.0 | 1625600.0 | 1618600.0 | 1618900.0 | 1622200.0 | 1623500.0 | 1639500.0 | 1678900.0 | 1705500.0 | 1702200.0 | 1701700.0 | 1730800.0 | 1773400.0 | 1839600.0 | 1903600.0 | 1935000.0 | 1946100.0 | 1974100.0 | 1987800.0 | 1993400.0 | 1990200.0 | 1970600.0 | 1953100.0 | 1979000.0 | 2032800.0 | 2070000.0 | 2090200.0 | 2121100.0 | 2167300.0 | 2200000.0 | 2226600.0 | 2259300.0 | 2285700.0 | 2289000.0 | 2288500.0 | 2285800.0 | 2281300.0 | 2278800.0 | 2295100.0 | 2324500.0 | 2328500.0 | 2333000.0 | 2366700.0 | 2376200.0 | 2377400.0 | 2405100.0 | 2433000.0 | 2406100.0 | 2350800.0 | 2289500.0 | 2257800.0 | 2243500.0 | 2241000.0 | 2233600.0 | 2219900.0 | 2198000.0 | 2209300.0 | 2268700.0 | 2294100.0 | 2273600.0 | 2261700.0 | 2282800.0 | 2288000.0 | 2275700.0 | 2281300.0 | 2302800.0 | 2290900.0 | 2267800.0 | 2253600.0 | 2245300.0 | 2264200.0 | 2301000.0 | 2328600.0 | 2337000.0 | 2341500.0 | 2335200.0 | 2316700.0 | 2299900.0 | 2318700.0 | 2396400.0 | 2479200.0 | 2490300.0 | 2466000.0 | 2492400.0 | 2542800.0 | 2585500.0 | 2618100.0 | 2651700.0 | 2670800.0 | 2670000.0 | 2665400.0 | 2677400.0 | 2687000.0 | 2686200.0 | 2687700.0 | 2694200.0 | 2708600.0 | 2722800.0 | 2727900.0 | 2751300.0 | 2794600 | 2845700 | 2899700 | 2928600 | 2914100 | 2915200 | 2945600 | 2973800 | 2988900 | 3018400 | 3063600 | 3075900 | 3063800 | 3077400 | 3108800 | 3123300 | 3106400 | 3067600 | 3047400 | 3054500 | 3071700 | 3078300 | 3073100 | 3089400 | 3121700 | 3134200 | 3152400 | 3173600 | 3195000 | 3234600 | 3302400.0 | 3335800.0 | 3324200.0 | 3285100.0 | 3274100 | 3298600 | 3341100 | 3347100 | 3305500 | 3261100 | 3244000.0 | 3231400 | 3183300 | 3160200 | 3193500 | 3262200 | 3316500 |
| 1 | 61628 | 10014 | New York | NY | New York | New York | 379 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1331700.0 | 1332100.0 | 1329200.0 | 1331500.0 | 1334300.0 | 1314100.0 | 1292300.0 | 1264600.0 | 1239900.0 | 1241400.0 | 1273900.0 | 1308200.0 | 1349300.0 | 1390900.0 | 1439000.0 | 1501900.0 | 1569000.0 | 1592600.0 | 1578800.0 | 1583800.0 | 1612400.0 | 1623900.0 | 1625500.0 | 1632400.0 | 1632000.0 | 1617600.0 | 1601500.0 | 1589100.0 | 1584000.0 | 1579500.0 | 1553800.0 | 1528600.0 | 1514500.0 | 1505800.0 | 1499800.0 | 1517000.0 | 1559000.0 | 1610300.0 | 1659600.0 | 1732200.0 | 1788500.0 | 1783100.0 | 1753200.0 | 1736200.0 | 1729400.0 | 1741500.0 | 1774500.0 | 1808400.0 | 1827500.0 | 1813800.0 | 1785500.0 | 1774800.0 | 1759800.0 | 1718200.0 | 1677300.0 | 1636900.0 | 1588800.0 | 1559300.0 | 1551900.0 | 1532400.0 | 1507000.0 | 1496600.0 | 1502400.0 | 1485700.0 | 1451800.0 | 1454400.0 | 1494500.0 | 1542200.0 | 1587300.0 | 1616000.0 | 1612300.0 | 1605800.0 | 1592500.0 | 1584200.0 | 1587600.0 | 1594300.0 | 1611700.0 | 1631800.0 | 1648800.0 | 1670500.0 | 1674300.0 | 1657100.0 | 1659200.0 | 1661800.0 | 1641600.0 | 1637800.0 | 1646600.0 | 1652100.0 | 1674800.0 | 1714300.0 | 1736800.0 | 1736800.0 | 1741200.0 | 1757700.0 | 1772600.0 | 1779300.0 | 1795000.0 | 1790300.0 | 1761100.0 | 1744400.0 | 1755400.0 | 1761200.0 | 1767000.0 | 1804300.0 | 1889100.0 | 1963800.0 | 2003800.0 | 2041600 | 2098500 | 2143400 | 2175600 | 2203100 | 2219000 | 2222700 | 2225400 | 2215900 | 2161700 | 2079100 | 2018200 | 1991700 | 1986200 | 2027900 | 2081200 | 2093200 | 2082800 | 2107300 | 2142400 | 2167100 | 2189400 | 2220100 | 2250300 | 2275700 | 2301600 | 2314700 | 2322000 | 2334500 | 2360300 | 2384700.0 | 2388200.0 | 2358300.0 | 2345800.0 | 2381700 | 2439700 | 2483000 | 2480800 | 2443200 | 2430100 | 2452900.0 | 2451200 | 2441900 | 2460900 | 2494900 | 2498400 | 2491600 |
| 2 | 61625 | 10011 | New York | NY | New York | New York | 15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1384000.0 | 1325500.0 | 1297400.0 | 1313500.0 | 1332400.0 | 1314500.0 | 1296700.0 | 1295100.0 | 1280000.0 | 1255300.0 | 1250800.0 | 1261200.0 | 1291100.0 | 1333900.0 | 1358500.0 | 1382600.0 | 1438800.0 | 1473100.0 | 1444500.0 | 1407100.0 | 1400300.0 | 1401600.0 | 1400500.0 | 1393700.0 | 1396100.0 | 1413200.0 | 1428200.0 | 1442400.0 | 1459000.0 | 1466500.0 | 1462000.0 | 1464300.0 | 1483300.0 | 1514600.0 | 1541900.0 | 1563500.0 | 1591300.0 | 1609900.0 | 1630300.0 | 1639200.0 | 1626700.0 | 1618900.0 | 1635400.0 | 1642000.0 | 1648300.0 | 1671100.0 | 1681400.0 | 1691600.0 | 1706200.0 | 1711100.0 | 1714100.0 | 1733400.0 | 1737700.0 | 1719900.0 | 1678200.0 | 1628400.0 | 1578300.0 | 1528400.0 | 1489200.0 | 1467000.0 | 1449400.0 | 1432100.0 | 1422700.0 | 1402200.0 | 1378600.0 | 1372300.0 | 1393100.0 | 1430000.0 | 1462100.0 | 1476800.0 | 1479000.0 | 1476600.0 | 1461200.0 | 1448300.0 | 1441600.0 | 1444300.0 | 1438600.0 | 1434100.0 | 1439300.0 | 1437700.0 | 1430300.0 | 1426800.0 | 1427800.0 | 1424600.0 | 1432800.0 | 1456500.0 | 1485100.0 | 1500200.0 | 1509600.0 | 1518500.0 | 1530800.0 | 1538000.0 | 1530500.0 | 1524500.0 | 1546500.0 | 1574800.0 | 1599600.0 | 1622500.0 | 1639000.0 | 1656100.0 | 1684600.0 | 1703000.0 | 1710000.0 | 1734300.0 | 1765200.0 | 1786000.0 | 1810700.0 | 1841500 | 1867600 | 1882200 | 1897000 | 1917300 | 1963400 | 1999200 | 2003500 | 2007900 | 2027700 | 2043500 | 2056300 | 2064500 | 2066000 | 2057900 | 2031300 | 1999000 | 1979200 | 1982900 | 2001600 | 2014700 | 2023500 | 2055300 | 2078300 | 2083600 | 2088800 | 2110600 | 2127500 | 2168900 | 2204700 | 2216100.0 | 2212500.0 | 2222600.0 | 2231900.0 | 2250800 | 2285200 | 2329100 | 2354000 | 2355500 | 2352200 | 2332100.0 | 2313300 | 2319600 | 2342100 | 2365900 | 2419700 | 2480400 |
| 3 | 97515 | 94024 | Los Altos | CA | San Jose | Santa Clara | 4335 | 419500.0 | 422300.0 | 430400.0 | 440400.0 | 447100.0 | 447900.0 | 448000.0 | 450100.0 | 452200.0 | 456100.0 | 464600.0 | 470400.0 | 472300.0 | 475400.0 | 480000.0 | 483200.0 | 488900.0 | 496900.0 | 504700.0 | 511300.0 | 519500.0 | 525900.0 | 529600.0 | 535900.0 | 542300.0 | 543300.0 | 543300.0 | 550000.0 | 560000.0 | 571200.0 | 582300.0 | 589800.0 | 589300.0 | 582800.0 | 577600.0 | 578800.0 | 580000.0 | 576600.0 | 576800.0 | 585700.0 | 601700.0 | 624300.0 | 648100.0 | 665900.0 | 681700.0 | 699800.0 | 716400.0 | 734300.0 | 758500.0 | 784600.0 | 808700.0 | 831900.0 | 851100.0 | 865700.0 | 876800.0 | 886200.0 | 900200.0 | 920500.0 | 931300.0 | 929500.0 | 922500.0 | 916300.0 | 897300.0 | 865600.0 | 833500.0 | 811200.0 | 796100.0 | 794200.0 | 799500.0 | 798900.0 | 799900.0 | 813000.0 | 831300.0 | 850200.0 | 874000.0 | 891100.0 | 890900.0 | 883000.0 | 879700.0 | 880100.0 | 880100.0 | 877300.0 | 874100.0 | 873600.0 | 878500.0 | 884500.0 | 887100.0 | 886000.0 | 890700.0 | 901000.0 | 910100.0 | 913600.0 | 918000.0 | 927600.0 | 936400.0 | 935000.0 | 931100.0 | 933300.0 | 935100.0 | 940700.0 | 960100.0 | 987200.0 | 1007000.0 | 1013100.0 | 1013900.0 | 1026100.0 | 1048200.0 | 1062300.0 | 1074600.0 | 1087900.0 | 1109200.0 | 1136800.0 | 1157400.0 | 1152900.0 | 1143300.0 | 1143200.0 | 1147700.0 | 1150700.0 | 1161200.0 | 1179600.0 | 1187100.0 | 1184400.0 | 1176800.0 | 1174800.0 | 1180600.0 | 1188200.0 | 1197100.0 | 1216400.0 | 1235400.0 | 1241500.0 | 1240800.0 | 1245600.0 | 1256800.0 | 1257000.0 | 1247600.0 | 1241500.0 | 1227800.0 | 1207300.0 | 1197400.0 | 1206500.0 | 1212500.0 | 1209900.0 | 1206500.0 | 1202900.0 | 1189300.0 | 1188200.0 | 1192400.0 | 1175900.0 | 1149900.0 | 1141700.0 | 1136900.0 | 1119500.0 | 1099800.0 | 1089100.0 | 1080700.0 | 1071300.0 | 1057500.0 | 1044200.0 | 1039800.0 | 1054700.0 | 1083700.0 | 1112900.0 | 1126800.0 | 1127900.0 | 1147500.0 | 1187000.0 | 1215500.0 | 1222600.0 | 1234500.0 | 1239500.0 | 1234000.0 | 1227200.0 | 1229200.0 | 1230600.0 | 1230000.0 | 1225700.0 | 1212000.0 | 1189100.0 | 1165800.0 | 1149400.0 | 1141500.0 | 1150000.0 | 1173500.0 | 1192400.0 | 1195300.0 | 1199700.0 | 1212100.0 | 1223300.0 | 1223600.0 | 1218000.0 | 1223500.0 | 1242300.0 | 1260700.0 | 1282900.0 | 1314100.0 | 1350100.0 | 1384700.0 | 1415900.0 | 1446800.0 | 1477500.0 | 1503200.0 | 1516700.0 | 1525000.0 | 1542100.0 | 1569100.0 | 1586700.0 | 1593500.0 | 1601000.0 | 1604800 | 1597200 | 1593600 | 1609400 | 1635100 | 1653900 | 1654600 | 1648100 | 1660500 | 1684200 | 1711700 | 1743000 | 1766800 | 1788900 | 1831600 | 1878600 | 1906300 | 1927000 | 1968300 | 2039400 | 2105400 | 2146900 | 2168900 | 2181400 | 2194600 | 2216100 | 2234900 | 2246700 | 2266900 | 2311300 | 2368200.0 | 2401200.0 | 2396100.0 | 2373800.0 | 2365100 | 2357300 | 2348900 | 2330300 | 2302700 | 2269300 | 2263600.0 | 2279300 | 2302700 | 2321700 | 2351200 | 2371500 | 2378100 |
| 4 | 97519 | 94028 | Portola Valley | CA | San Francisco | San Mateo | 7158 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 656200.0 | 666600.0 | 684300.0 | 690200.0 | 690000.0 | 679000.0 | 677500.0 | 688800.0 | 695600.0 | 685000.0 | 682000.0 | 689100.0 | 696500.0 | 701800.0 | 712800.0 | 715500.0 | 717600.0 | 737400.0 | 774600.0 | 803200.0 | 818900.0 | 830100.0 | 845900.0 | 865000.0 | 879600.0 | 892900.0 | 907400.0 | 943900.0 | 984200.0 | 1006400.0 | 1020200.0 | 1069100.0 | 1124300.0 | 1175800.0 | 1213300.0 | 1239600.0 | 1259700.0 | 1283300.0 | 1283900.0 | 1276400.0 | 1294200.0 | 1301300.0 | 1268400.0 | 1230500.0 | 1211900.0 | 1205600.0 | 1192000.0 | 1167200.0 | 1160500.0 | 1179600.0 | 1197500.0 | 1200100.0 | 1198800.0 | 1212100.0 | 1228200.0 | 1227500.0 | 1225700.0 | 1225800.0 | 1227200.0 | 1244100.0 | 1263600.0 | 1256000.0 | 1244500.0 | 1235400.0 | 1229000.0 | 1238400.0 | 1259900.0 | 1263400.0 | 1279800.0 | 1316500.0 | 1352400.0 | 1387300.0 | 1415100.0 | 1433200.0 | 1466300.0 | 1506100.0 | 1497400.0 | 1450100.0 | 1412900.0 | 1396400.0 | 1375900.0 | 1348800.0 | 1337100.0 | 1328500.0 | 1319300.0 | 1312500.0 | 1309800.0 | 1314200.0 | 1335500.0 | 1366900.0 | 1402100.0 | 1431400.0 | 1454800.0 | 1483600.0 | 1513500.0 | 1537200.0 | 1575200.0 | 1615200.0 | 1624200.0 | 1615300.0 | 1628900.0 | 1651300.0 | 1672600.0 | 1692800.0 | 1688900.0 | 1658700.0 | 1633400.0 | 1625900.0 | 1611900.0 | 1608200.0 | 1612100.0 | 1606200.0 | 1576700.0 | 1545300.0 | 1516800.0 | 1500600.0 | 1478600.0 | 1457600.0 | 1440300.0 | 1420400.0 | 1402500.0 | 1398300.0 | 1405600.0 | 1421800.0 | 1434400.0 | 1436100.0 | 1433000.0 | 1426700.0 | 1424300.0 | 1430200.0 | 1416500.0 | 1373700.0 | 1326800.0 | 1289800.0 | 1263600.0 | 1250600.0 | 1250300.0 | 1258400.0 | 1267400.0 | 1273800.0 | 1285000.0 | 1295100.0 | 1316800.0 | 1337700.0 | 1341400.0 | 1338200.0 | 1337600.0 | 1323600.0 | 1316300.0 | 1328100.0 | 1344400.0 | 1355900.0 | 1357100.0 | 1346500.0 | 1337300.0 | 1344500.0 | 1354600.0 | 1343400.0 | 1324300.0 | 1326500.0 | 1331400.0 | 1323300.0 | 1316200.0 | 1316700.0 | 1324000.0 | 1337600.0 | 1354000.0 | 1364000.0 | 1367900.0 | 1377100.0 | 1400100.0 | 1425600.0 | 1456100.0 | 1479700.0 | 1480400.0 | 1470700.0 | 1480200.0 | 1504100.0 | 1513100.0 | 1526200.0 | 1577300.0 | 1644900.0 | 1682400.0 | 1692400.0 | 1698600.0 | 1718800.0 | 1726100 | 1721200 | 1724300 | 1743800 | 1758500 | 1772300 | 1779700 | 1793800 | 1817500 | 1828100 | 1818900 | 1809100 | 1824900 | 1864200 | 1913900 | 1947900 | 1971100 | 1992700 | 2018200 | 2033500 | 2044100 | 2069100 | 2108600 | 2131000 | 2143000 | 2164400 | 2178700 | 2183900 | 2203200 | 2239000 | 2280600.0 | 2318900.0 | 2333600.0 | 2322300.0 | 2309700 | 2319500 | 2344800 | 2350800 | 2336500 | 2317200 | 2313800.0 | 2324200 | 2317300 | 2294500 | 2296300 | 2329400 | 2358300 |
zillow.columns
Index(['RegionID', 'RegionName', 'City', 'State', 'Metro', 'CountyName',
'SizeRank', '1996-04', '1996-05', '1996-06',
...
'2016-09', '2016-10', '2016-11', '2016-12', '2017-01', '2017-02',
'2017-03', '2017-04', '2017-05', '2017-06'],
dtype='object', length=262)
zillow['City'].unique()
array(['New York', 'Los Altos', 'Portola Valley', ..., 'Dunkirk',
'Harper Woods', 'Inkster'], dtype=object)
zillow_ny = zillow[zillow['City'] == 'New York']
zillow_ny = zillow_ny.rename(columns={'RegionName':'zipcode'})
zillow_ny.head()
| RegionID | zipcode | City | State | Metro | CountyName | SizeRank | 1996-04 | 1996-05 | 1996-06 | 1996-07 | 1996-08 | 1996-09 | 1996-10 | 1996-11 | 1996-12 | 1997-01 | 1997-02 | 1997-03 | 1997-04 | 1997-05 | 1997-06 | 1997-07 | 1997-08 | 1997-09 | 1997-10 | 1997-11 | 1997-12 | 1998-01 | 1998-02 | 1998-03 | 1998-04 | 1998-05 | 1998-06 | 1998-07 | 1998-08 | 1998-09 | 1998-10 | 1998-11 | 1998-12 | 1999-01 | 1999-02 | 1999-03 | 1999-04 | 1999-05 | 1999-06 | 1999-07 | 1999-08 | 1999-09 | 1999-10 | 1999-11 | 1999-12 | 2000-01 | 2000-02 | 2000-03 | 2000-04 | 2000-05 | 2000-06 | 2000-07 | 2000-08 | 2000-09 | 2000-10 | 2000-11 | 2000-12 | 2001-01 | 2001-02 | 2001-03 | 2001-04 | 2001-05 | 2001-06 | 2001-07 | 2001-08 | 2001-09 | 2001-10 | 2001-11 | 2001-12 | 2002-01 | 2002-02 | 2002-03 | 2002-04 | 2002-05 | 2002-06 | 2002-07 | 2002-08 | 2002-09 | 2002-10 | 2002-11 | 2002-12 | 2003-01 | 2003-02 | 2003-03 | 2003-04 | 2003-05 | 2003-06 | 2003-07 | 2003-08 | 2003-09 | 2003-10 | 2003-11 | 2003-12 | 2004-01 | 2004-02 | 2004-03 | 2004-04 | 2004-05 | 2004-06 | 2004-07 | 2004-08 | 2004-09 | 2004-10 | 2004-11 | 2004-12 | 2005-01 | 2005-02 | 2005-03 | 2005-04 | 2005-05 | 2005-06 | 2005-07 | 2005-08 | 2005-09 | 2005-10 | 2005-11 | 2005-12 | 2006-01 | 2006-02 | 2006-03 | 2006-04 | 2006-05 | 2006-06 | 2006-07 | 2006-08 | 2006-09 | 2006-10 | 2006-11 | 2006-12 | 2007-01 | 2007-02 | 2007-03 | 2007-04 | 2007-05 | 2007-06 | 2007-07 | 2007-08 | 2007-09 | 2007-10 | 2007-11 | 2007-12 | 2008-01 | 2008-02 | 2008-03 | 2008-04 | 2008-05 | 2008-06 | 2008-07 | 2008-08 | 2008-09 | 2008-10 | 2008-11 | 2008-12 | 2009-01 | 2009-02 | 2009-03 | 2009-04 | 2009-05 | 2009-06 | 2009-07 | 2009-08 | 2009-09 | 2009-10 | 2009-11 | 2009-12 | 2010-01 | 2010-02 | 2010-03 | 2010-04 | 2010-05 | 2010-06 | 2010-07 | 2010-08 | 2010-09 | 2010-10 | 2010-11 | 2010-12 | 2011-01 | 2011-02 | 2011-03 | 2011-04 | 2011-05 | 2011-06 | 2011-07 | 2011-08 | 2011-09 | 2011-10 | 2011-11 | 2011-12 | 2012-01 | 2012-02 | 2012-03 | 2012-04 | 2012-05 | 2012-06 | 2012-07 | 2012-08 | 2012-09 | 2012-10 | 2012-11 | 2012-12 | 2013-01 | 2013-02 | 2013-03 | 2013-04 | 2013-05 | 2013-06 | 2013-07 | 2013-08 | 2013-09 | 2013-10 | 2013-11 | 2013-12 | 2014-01 | 2014-02 | 2014-03 | 2014-04 | 2014-05 | 2014-06 | 2014-07 | 2014-08 | 2014-09 | 2014-10 | 2014-11 | 2014-12 | 2015-01 | 2015-02 | 2015-03 | 2015-04 | 2015-05 | 2015-06 | 2015-07 | 2015-08 | 2015-09 | 2015-10 | 2015-11 | 2015-12 | 2016-01 | 2016-02 | 2016-03 | 2016-04 | 2016-05 | 2016-06 | 2016-07 | 2016-08 | 2016-09 | 2016-10 | 2016-11 | 2016-12 | 2017-01 | 2017-02 | 2017-03 | 2017-04 | 2017-05 | 2017-06 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 61627 | 10013 | New York | NY | New York | New York | 1744 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1428000.0 | 1444100.0 | 1458700.0 | 1502700.0 | 1570100.0 | 1638000.0 | 1687000.0 | 1718300.0 | 1747600.0 | 1753300.0 | 1706600.0 | 1660500.0 | 1650500.0 | 1642200.0 | 1625600.0 | 1618600.0 | 1618900.0 | 1622200.0 | 1623500.0 | 1639500.0 | 1678900.0 | 1705500.0 | 1702200.0 | 1701700.0 | 1730800.0 | 1773400.0 | 1839600.0 | 1903600.0 | 1935000.0 | 1946100.0 | 1974100.0 | 1987800.0 | 1993400.0 | 1990200.0 | 1970600.0 | 1953100.0 | 1979000.0 | 2032800.0 | 2070000.0 | 2090200.0 | 2121100.0 | 2167300.0 | 2200000.0 | 2226600.0 | 2259300.0 | 2285700.0 | 2289000.0 | 2288500.0 | 2285800.0 | 2281300.0 | 2278800.0 | 2295100.0 | 2324500.0 | 2328500.0 | 2333000.0 | 2366700.0 | 2376200.0 | 2377400.0 | 2405100.0 | 2433000.0 | 2406100.0 | 2350800.0 | 2289500.0 | 2257800.0 | 2243500.0 | 2241000.0 | 2233600.0 | 2219900.0 | 2198000.0 | 2209300.0 | 2268700.0 | 2294100.0 | 2273600.0 | 2261700.0 | 2282800.0 | 2288000.0 | 2275700.0 | 2281300.0 | 2302800.0 | 2290900.0 | 2267800.0 | 2253600.0 | 2245300.0 | 2264200.0 | 2301000.0 | 2328600.0 | 2337000.0 | 2341500.0 | 2335200.0 | 2316700.0 | 2299900.0 | 2318700.0 | 2396400.0 | 2479200.0 | 2490300.0 | 2466000.0 | 2492400.0 | 2542800.0 | 2585500.0 | 2618100.0 | 2651700.0 | 2670800.0 | 2670000.0 | 2665400.0 | 2677400.0 | 2687000.0 | 2686200.0 | 2687700.0 | 2694200.0 | 2708600.0 | 2722800.0 | 2727900.0 | 2751300.0 | 2794600 | 2845700 | 2899700 | 2928600 | 2914100 | 2915200 | 2945600 | 2973800 | 2988900 | 3018400 | 3063600 | 3075900 | 3063800 | 3077400 | 3108800 | 3123300 | 3106400 | 3067600 | 3047400 | 3054500 | 3071700 | 3078300 | 3073100 | 3089400 | 3121700 | 3134200 | 3152400 | 3173600 | 3195000 | 3234600 | 3302400.0 | 3335800.0 | 3324200.0 | 3285100.0 | 3274100 | 3298600 | 3341100 | 3347100 | 3305500 | 3261100 | 3244000.0 | 3231400 | 3183300 | 3160200 | 3193500 | 3262200 | 3316500 |
| 1 | 61628 | 10014 | New York | NY | New York | New York | 379 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1331700.0 | 1332100.0 | 1329200.0 | 1331500.0 | 1334300.0 | 1314100.0 | 1292300.0 | 1264600.0 | 1239900.0 | 1241400.0 | 1273900.0 | 1308200.0 | 1349300.0 | 1390900.0 | 1439000.0 | 1501900.0 | 1569000.0 | 1592600.0 | 1578800.0 | 1583800.0 | 1612400.0 | 1623900.0 | 1625500.0 | 1632400.0 | 1632000.0 | 1617600.0 | 1601500.0 | 1589100.0 | 1584000.0 | 1579500.0 | 1553800.0 | 1528600.0 | 1514500.0 | 1505800.0 | 1499800.0 | 1517000.0 | 1559000.0 | 1610300.0 | 1659600.0 | 1732200.0 | 1788500.0 | 1783100.0 | 1753200.0 | 1736200.0 | 1729400.0 | 1741500.0 | 1774500.0 | 1808400.0 | 1827500.0 | 1813800.0 | 1785500.0 | 1774800.0 | 1759800.0 | 1718200.0 | 1677300.0 | 1636900.0 | 1588800.0 | 1559300.0 | 1551900.0 | 1532400.0 | 1507000.0 | 1496600.0 | 1502400.0 | 1485700.0 | 1451800.0 | 1454400.0 | 1494500.0 | 1542200.0 | 1587300.0 | 1616000.0 | 1612300.0 | 1605800.0 | 1592500.0 | 1584200.0 | 1587600.0 | 1594300.0 | 1611700.0 | 1631800.0 | 1648800.0 | 1670500.0 | 1674300.0 | 1657100.0 | 1659200.0 | 1661800.0 | 1641600.0 | 1637800.0 | 1646600.0 | 1652100.0 | 1674800.0 | 1714300.0 | 1736800.0 | 1736800.0 | 1741200.0 | 1757700.0 | 1772600.0 | 1779300.0 | 1795000.0 | 1790300.0 | 1761100.0 | 1744400.0 | 1755400.0 | 1761200.0 | 1767000.0 | 1804300.0 | 1889100.0 | 1963800.0 | 2003800.0 | 2041600 | 2098500 | 2143400 | 2175600 | 2203100 | 2219000 | 2222700 | 2225400 | 2215900 | 2161700 | 2079100 | 2018200 | 1991700 | 1986200 | 2027900 | 2081200 | 2093200 | 2082800 | 2107300 | 2142400 | 2167100 | 2189400 | 2220100 | 2250300 | 2275700 | 2301600 | 2314700 | 2322000 | 2334500 | 2360300 | 2384700.0 | 2388200.0 | 2358300.0 | 2345800.0 | 2381700 | 2439700 | 2483000 | 2480800 | 2443200 | 2430100 | 2452900.0 | 2451200 | 2441900 | 2460900 | 2494900 | 2498400 | 2491600 |
| 2 | 61625 | 10011 | New York | NY | New York | New York | 15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1384000.0 | 1325500.0 | 1297400.0 | 1313500.0 | 1332400.0 | 1314500.0 | 1296700.0 | 1295100.0 | 1280000.0 | 1255300.0 | 1250800.0 | 1261200.0 | 1291100.0 | 1333900.0 | 1358500.0 | 1382600.0 | 1438800.0 | 1473100.0 | 1444500.0 | 1407100.0 | 1400300.0 | 1401600.0 | 1400500.0 | 1393700.0 | 1396100.0 | 1413200.0 | 1428200.0 | 1442400.0 | 1459000.0 | 1466500.0 | 1462000.0 | 1464300.0 | 1483300.0 | 1514600.0 | 1541900.0 | 1563500.0 | 1591300.0 | 1609900.0 | 1630300.0 | 1639200.0 | 1626700.0 | 1618900.0 | 1635400.0 | 1642000.0 | 1648300.0 | 1671100.0 | 1681400.0 | 1691600.0 | 1706200.0 | 1711100.0 | 1714100.0 | 1733400.0 | 1737700.0 | 1719900.0 | 1678200.0 | 1628400.0 | 1578300.0 | 1528400.0 | 1489200.0 | 1467000.0 | 1449400.0 | 1432100.0 | 1422700.0 | 1402200.0 | 1378600.0 | 1372300.0 | 1393100.0 | 1430000.0 | 1462100.0 | 1476800.0 | 1479000.0 | 1476600.0 | 1461200.0 | 1448300.0 | 1441600.0 | 1444300.0 | 1438600.0 | 1434100.0 | 1439300.0 | 1437700.0 | 1430300.0 | 1426800.0 | 1427800.0 | 1424600.0 | 1432800.0 | 1456500.0 | 1485100.0 | 1500200.0 | 1509600.0 | 1518500.0 | 1530800.0 | 1538000.0 | 1530500.0 | 1524500.0 | 1546500.0 | 1574800.0 | 1599600.0 | 1622500.0 | 1639000.0 | 1656100.0 | 1684600.0 | 1703000.0 | 1710000.0 | 1734300.0 | 1765200.0 | 1786000.0 | 1810700.0 | 1841500 | 1867600 | 1882200 | 1897000 | 1917300 | 1963400 | 1999200 | 2003500 | 2007900 | 2027700 | 2043500 | 2056300 | 2064500 | 2066000 | 2057900 | 2031300 | 1999000 | 1979200 | 1982900 | 2001600 | 2014700 | 2023500 | 2055300 | 2078300 | 2083600 | 2088800 | 2110600 | 2127500 | 2168900 | 2204700 | 2216100.0 | 2212500.0 | 2222600.0 | 2231900.0 | 2250800 | 2285200 | 2329100 | 2354000 | 2355500 | 2352200 | 2332100.0 | 2313300 | 2319600 | 2342100 | 2365900 | 2419700 | 2480400 |
| 6 | 61617 | 10003 | New York | NY | New York | New York | 21 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1387000.0 | 1315300.0 | 1265900.0 | 1217100.0 | 1167500.0 | 1125800.0 | 1106100.0 | 1105900.0 | 1119200.0 | 1130100.0 | 1127600.0 | 1129900.0 | 1168000.0 | 1214000.0 | 1236900.0 | 1295200.0 | 1398900.0 | 1431700.0 | 1377700.0 | 1343100.0 | 1351900.0 | 1354000.0 | 1358200.0 | 1374800.0 | 1378500.0 | 1378400.0 | 1418300.0 | 1441000.0 | 1427900.0 | 1401900.0 | 1391800.0 | 1379800.0 | 1371400.0 | 1367700.0 | 1400900.0 | 1441100.0 | 1475000.0 | 1491500.0 | 1504700.0 | 1499200.0 | 1481900.0 | 1478800.0 | 1509000.0 | 1532700.0 | 1524300.0 | 1520600.0 | 1533200.0 | 1556600.0 | 1586000.0 | 1595200.0 | 1578900.0 | 1587000.0 | 1627200.0 | 1648100.0 | 1614400.0 | 1553000.0 | 1486000.0 | 1417000.0 | 1375100.0 | 1361300.0 | 1333400.0 | 1299700.0 | 1296500.0 | 1273900.0 | 1227800.0 | 1202600.0 | 1207100.0 | 1220100.0 | 1252400.0 | 1285700.0 | 1278300.0 | 1279100.0 | 1326700.0 | 1376500.0 | 1368900.0 | 1366000.0 | 1381300.0 | 1380700.0 | 1368500.0 | 1372700.0 | 1378000.0 | 1361700.0 | 1357800.0 | 1364400.0 | 1358000.0 | 1329800.0 | 1317800.0 | 1333200.0 | 1348500.0 | 1349500.0 | 1352200.0 | 1354100.0 | 1351900.0 | 1364200.0 | 1376600.0 | 1384200.0 | 1387900.0 | 1404200.0 | 1419200.0 | 1425700.0 | 1435300.0 | 1460300.0 | 1466500.0 | 1458100.0 | 1465500.0 | 1502300.0 | 1563900.0 | 1592000 | 1596200 | 1625200 | 1672300 | 1699500 | 1718500 | 1734300 | 1748600 | 1763700 | 1766700 | 1772200 | 1762700 | 1736700 | 1712400 | 1703700 | 1702500 | 1708800 | 1716300 | 1720500 | 1721800 | 1741800 | 1775800 | 1796500 | 1821500 | 1870100 | 1901000 | 1904900 | 1914000 | 1926400 | 1932200 | 1936700.0 | 1945200.0 | 1935600.0 | 1911200.0 | 1918700 | 1947600 | 1951300 | 1932800 | 1930400 | 1937500 | 1935100.0 | 1915700 | 1916500 | 1965700 | 2045300 | 2109100 | 2147000 |
| 7 | 61637 | 10023 | New York | NY | New York | New York | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1526800.0 | 1424500.0 | 1346600.0 | 1331300.0 | 1322500.0 | 1289300.0 | 1265400.0 | 1249700.0 | 1241100.0 | 1232700.0 | 1225500.0 | 1228200.0 | 1252600.0 | 1266100.0 | 1288700.0 | 1308100.0 | 1333000.0 | 1356400.0 | 1362000.0 | 1353600.0 | 1364000.0 | 1373900.0 | 1389600.0 | 1401600.0 | 1404100.0 | 1415800.0 | 1432400.0 | 1455400.0 | 1474200.0 | 1462300.0 | 1438300.0 | 1435500.0 | 1427800.0 | 1411200.0 | 1407400.0 | 1419700.0 | 1457400.0 | 1500800.0 | 1524900.0 | 1537800.0 | 1558700.0 | 1586100.0 | 1602300.0 | 1621100.0 | 1639300.0 | 1657400.0 | 1657400.0 | 1656100.0 | 1649400.0 | 1643400.0 | 1632400.0 | 1618200.0 | 1588300.0 | 1543600.0 | 1500800.0 | 1464200.0 | 1426100.0 | 1387300.0 | 1362600.0 | 1351700.0 | 1344300.0 | 1331800.0 | 1334800.0 | 1314200.0 | 1271900.0 | 1252300.0 | 1262300.0 | 1279200.0 | 1309000.0 | 1335300.0 | 1353800.0 | 1366400.0 | 1372100.0 | 1381300.0 | 1385000.0 | 1388100.0 | 1399100.0 | 1399800.0 | 1389300.0 | 1384700.0 | 1380900.0 | 1367900.0 | 1365400.0 | 1375100.0 | 1380400.0 | 1377000.0 | 1375100.0 | 1379000.0 | 1395200.0 | 1414500.0 | 1419000.0 | 1403100.0 | 1383200.0 | 1376700.0 | 1378200.0 | 1378700.0 | 1375900.0 | 1366700.0 | 1365500.0 | 1382200.0 | 1404700.0 | 1428000.0 | 1445700.0 | 1452900.0 | 1460100.0 | 1484400.0 | 1508400.0 | 1522800 | 1538300 | 1568600 | 1597400 | 1622900 | 1654300 | 1684600 | 1713000 | 1728800 | 1736100 | 1745900 | 1753800 | 1736600 | 1730400 | 1734500 | 1728700 | 1720800 | 1717700 | 1700100 | 1680400 | 1676400 | 1685600 | 1708100 | 1730400 | 1751800 | 1778300 | 1810400 | 1831600 | 1844400 | 1861600 | 1889600.0 | 1901500.0 | 1895300.0 | 1890200.0 | 1898400 | 1924500 | 1967300 | 1993500 | 1980700 | 1960900 | 1951300.0 | 1937800 | 1929800 | 1955000 | 2022400 | 2095000 | 2142300 |
# check for all null values
zillow_ny['zipcode'].isnull().sum()
0
zillow_ny['zipcode']=zillow_ny['zipcode'].astype('str')
len(zillow_ny['zipcode'].unique())
25
columns_selected = ['RegionID', 'zipcode', 'City', 'State', 'Metro', 'CountyName',
'SizeRank', '2017-06']
zillow_ny_new = zillow_ny[columns_selected]
zillow_ny_new.reset_index(drop=True, inplace=True)
zillow_ny_new.rename(columns={'2017-06':'cost'}, inplace=True)
zillow_ny_new.head(2)
| RegionID | zipcode | City | State | Metro | CountyName | SizeRank | cost | |
|---|---|---|---|---|---|---|---|---|
| 0 | 61627 | 10013 | New York | NY | New York | New York | 1744 | 3316500 |
| 1 | 61628 | 10014 | New York | NY | New York | New York | 379 | 2491600 |
merged_data = pd.merge(air_bnb_ny_norm_new,zillow_ny_new, on = 'zipcode')
merged_data.head(2)
| state | host_id | host_since | host_is_superhost | host_total_listings_count | neighbourhood_cleansed | neighbourhood_group_cleansed | room_type | bedrooms | bathrooms | square_feet | minimum_nights | maximum_nights | number_of_reviews | last_review | review_scores_rating | security_deposit | cleaning_fee | city | latitude | longitude | zipcode | price | weekly_price | monthly_price | availability_30 | availability_60 | availability_90 | availability_365 | cancellation_policy | RegionID | City | State | Metro | CountyName | SizeRank | cost | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NY | 103222412.0 | 11/10/16 | f | 1.0 | Arrochar | Staten Island | Entire home/apt | 2.0 | 1.0 | NaN | 7.0 | 8.0 | 0.0 | NaN | NaN | NaN | 100.0 | Staten Island | 40.595265 | -74.064715 | 10305 | 120.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | flexible | 61781 | New York | NY | New York | Richmond | 2087 | 425100 |
| 1 | NY | 41398005.0 | 8/13/15 | f | 3.0 | Arrochar | Staten Island | Entire home/apt | 2.0 | 1.0 | NaN | 2.0 | 7.0 | 0.0 | NaN | NaN | $100.00 | 70.0 | Staten Island | 40.593469 | -74.069137 | 10305 | 110.0 | NaN | NaN | 1.0 | 16.0 | 35.0 | 197.0 | flexible | 61781 | New York | NY | New York | Richmond | 2087 | 425100 |
merged_data.shape
(1075, 37)
columns = ['zipcode', 'neighbourhood_group_cleansed', 'room_type', 'bedrooms', 'latitude', 'last_review',
'longitude', 'city', 'price','cleaning_fee', 'square_feet', 'minimum_nights',
'maximum_nights','number_of_reviews','review_scores_rating', 'availability_365', 'cost']
data_final = merged_data[columns]
data_final.head(3)
| zipcode | neighbourhood_group_cleansed | room_type | bedrooms | latitude | last_review | longitude | city | price | cleaning_fee | square_feet | minimum_nights | maximum_nights | number_of_reviews | review_scores_rating | availability_365 | cost | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10305 | Staten Island | Entire home/apt | 2.0 | 40.595265 | NaN | -74.064715 | Staten Island | 120.0 | 100.0 | NaN | 7.0 | 8.0 | 0.0 | NaN | 0.0 | 425100 |
| 1 | 10305 | Staten Island | Entire home/apt | 2.0 | 40.593469 | NaN | -74.069137 | Staten Island | 110.0 | 70.0 | NaN | 2.0 | 7.0 | 0.0 | NaN | 197.0 | 425100 |
| 2 | 10305 | Staten Island | Entire home/apt | 2.0 | 40.596528 | NaN | -74.063712 | Staten Island | 150.0 | 100.0 | NaN | 2.0 | 1125.0 | 0.0 | NaN | 0.0 | 425100 |
len(data_final[data_final['availability_365']==0])
346
0 days availability can possibly refer to the listing not being available at the moment, so we can keep this data
Assumptions for Revenue and Profits
Revenue = [Price_Per_day] No.of_days_rented occupancy_rate + [Cleaning fee] 0.40 365
data_final['annual_income'] = (data_final['price']) * (0.75 * 365) + data_final['cleaning_fee'] * (0.40 * 365)
data_final['annual_income'] = data_final['annual_income'].round(2)
data_final.head(2)
| zipcode | neighbourhood_group_cleansed | room_type | bedrooms | latitude | last_review | longitude | city | price | cleaning_fee | square_feet | minimum_nights | maximum_nights | number_of_reviews | review_scores_rating | availability_365 | cost | annual_income | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10305 | Staten Island | Entire home/apt | 2.0 | 40.595265 | NaN | -74.064715 | Staten Island | 120.0 | 100.0 | NaN | 7.0 | 8.0 | 0.0 | NaN | 0.0 | 425100 | 47450.0 |
| 1 | 10305 | Staten Island | Entire home/apt | 2.0 | 40.593469 | NaN | -74.069137 | Staten Island | 110.0 | 70.0 | NaN | 2.0 | 7.0 | 0.0 | NaN | 197.0 | 425100 | 40332.5 |
Calculating how many years does it take to recover our money. So, the time taken for recovering the investment will be the ratio of investment to Income per year. This is called break-even.
Time To Recover Investment = (Investment per listing) / (Annual Income per listing)
Profitability Index:
ProfitabilityIndex(PI) = (Presentvalueoffuturecashflows)/(IntialInvestment)
Profitability Index represents the relationship between the costs and benefits of a proposed project. It is the ratio of present value of future cash flows and intial investment.Here , since time value of money discount rate is 0 . Assuming the properties sell at the same price after 10 years of being in the rental business.
Presentvalueoffuturecashflows = (10 (Annual Income per listing) + (Investment per listing))
Ten Year PI = (10 (Annual Income per listing) + (Investment per listing))/(Investment per listing)
data_final['Invest_Rec']= data_final['cost'] / data_final['annual_income']
data_final['Invest_Rec']= data_final['Invest_Rec'].astype('float')
data_final['ten_year_PI'] = ((10*data_final['annual_income']) + data_final['cost'])/data_final['cost']
data_final.head(3)
| zipcode | neighbourhood_group_cleansed | room_type | bedrooms | latitude | last_review | longitude | city | price | cleaning_fee | square_feet | minimum_nights | maximum_nights | number_of_reviews | review_scores_rating | availability_365 | cost | annual_income | Invest_Rec | ten_year_PI | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10305 | Staten Island | Entire home/apt | 2.0 | 40.595265 | NaN | -74.064715 | Staten Island | 120.0 | 100.0 | NaN | 7.0 | 8.0 | 0.0 | NaN | 0.0 | 425100 | 47450.0 | 8.958904 | 2.116208 |
| 1 | 10305 | Staten Island | Entire home/apt | 2.0 | 40.593469 | NaN | -74.069137 | Staten Island | 110.0 | 70.0 | NaN | 2.0 | 7.0 | 0.0 | NaN | 197.0 | 425100 | 40332.5 | 10.539887 | 1.948777 |
| 2 | 10305 | Staten Island | Entire home/apt | 2.0 | 40.596528 | NaN | -74.063712 | Staten Island | 150.0 | 100.0 | NaN | 2.0 | 1125.0 | 0.0 | NaN | 0.0 | 425100 | 55662.5 | 7.637099 | 2.309398 |
For final "data_final" Dataset:
annual_income: This column gives a profit estimate that a host can get by renting the property in one year with 75% occupancy rate.
Invest_Rec: This column gives an estimate no.of years host needs to recover the investment for that listing.
ten_year_PI: This column gives the best zipcode that could be recovered back in less than 10 years.
labels = data_final.neighbourhood_group_cleansed.value_counts().index
sizes = data_final.neighbourhood_group_cleansed.value_counts().values
explode = (0.1, 0.2, 0.3, 0.4)
fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(sizes, explode=explode, labels=labels, autopct='%2.1f%%',
shadow=True, startangle=130)
# Equal aspect ratio ensures that pie is drawn as a circle.ax.axis('equal')
ax.set(title="More Rented Neighbourhood Group Piechart")
ax.legend(wedges, labels,
title="Neighbourhood Groups",
loc="center left",
bbox_to_anchor=(1, 0,0.5, 1))
plt.setp(autotexts, size=8, weight="bold")
plt.show()
Observations:
From above graph we can observe that the most rented neighbourhood group is Manhattan i.e 65% , Brooklyn i.e 32.9% and Queens i.e 1.3%. As we can see given the last results that as much surface has the neighbourhood group doesn't mean will have the most rented zone, moreover it's the smallest one !
# Most popular neighbourhood areas and comparision with the room type in that area
fig, ax = plt.subplots(figsize=(10, 8))
ax = sns.countplot(x = 'neighbourhood_group_cleansed', data =data_final, hue='room_type',palette="Set2")
ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha="right")
ax.set_title("Room Type Distribution in Neighbourhoods",fontsize= 20) # title of plot
ax.set_xlabel('Neighbourhood',fontsize = 15) #xlabel
ax.set_ylabel('Counts', fontsize = 15)#ylabel
plt.show()
#Bar graph for neighbourhood vs Avg. Price
neighbourhood = data_final.groupby('neighbourhood_group_cleansed').price.mean().round(2).sort_values(ascending=False).head()
neighbourhood=neighbourhood.reset_index()
neighbourhood=neighbourhood.rename(columns={'price':'Avg_Price'})
fig, axs = plt.subplots(1,1,figsize=(10,6))
ax = sns.barplot(x="neighbourhood_group_cleansed", y="Avg_Price", data=neighbourhood, hue='neighbourhood_group_cleansed')
ax.set_title('Neighbourhood with highest Average Price', fontsize= 20)
ax.set_xlabel('Neighbourhood',fontsize = 15) #xlabel
ax.set_ylabel('Average Price', fontsize = 15)#ylabel
plt.xticks(rotation=90)
(array([0, 1, 2, 3]), [Text(0, 0, 'Manhattan'), Text(1, 0, 'Brooklyn'), Text(2, 0, 'Staten Island'), Text(3, 0, 'Queens')])
fig, ax = plt.subplots(figsize=(8.7, 8.27))
ax = sns.countplot(x = 'neighbourhood_group_cleansed', data =data_final, hue='neighbourhood_group_cleansed',palette="Set2")
ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha="right")
ax.set_title("Property Distribution in Neighbourhoods",fontsize= 20) # title of plot
ax.set_xlabel('Neighbourhood',fontsize = 15) #xlabel
ax.set_ylabel('Counts', fontsize = 15)#ylabel
plt.show()
Observations:
#Bar-graph for the room_type vs Avg_price
room_type_price=data_final.groupby('room_type').price.mean()
room_type_price=room_type_price.reset_index()
room_type_price=room_type_price.rename(columns={'price':'Avg_Price'})
fig, axs = plt.subplots(1,1,figsize=(7,5))
ax = sns.barplot(x="room_type", y="Avg_Price", data=room_type_price, hue='room_type')
ax.set_title('Comparison of Room_Type Vs Average Price', fontsize= 15)
ax.set_xlabel('Room Type',fontsize = 13) #xlabel
ax.set_ylabel('Average Price', fontsize = 13)#ylabel
plt.xticks(rotation=90)
(array([0, 1]), [Text(0, 0, 'Entire home/apt'), Text(1, 0, 'Private room')])
#Bar-graph for the room_type vs Avg. time to reover the investment.
room_type_Rec=data_final.groupby('room_type').Invest_Rec.mean()
room_type_Rec=room_type_Rec.reset_index()
room_type_Rec=room_type_Rec.rename(columns={'Invest_Rec':'Avg_Invest_Rec'})
fig, axs = plt.subplots(1,1,figsize=(7,5))
ax = sns.barplot(x="room_type", y="Avg_Invest_Rec", data=room_type_Rec, hue='room_type')
ax.set_title('Recovery time for the type of room')
ax.set_xlabel('Room Type',fontsize = 13) #xlabel
ax.set_ylabel('Average Investment Recovery', fontsize = 13)#ylabel
plt.xticks(rotation=90)
(array([0, 1]), [Text(0, 0, 'Entire home/apt'), Text(1, 0, 'Private room')])
Observations:
Bar-graph for the zipcode vs annual price
data_zip = data_final.copy()
data_zip= data_zip.sort_values(['cost'], ascending=False)
fig, axs = plt.subplots(1,1,figsize=(11,9))
ax = sns.barplot(x="zipcode", y="cost", data=data_zip)
ax.set_title('Price Distribution per zipcodes', fontsize= 20)
ax.set_xlabel('Zipcodes',fontsize = 15) #xlabel
ax.set_ylabel('Price Value Range', fontsize = 15)#ylabel
plt.xticks(rotation=90)
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21]),
[Text(0, 0, '10013'),
Text(1, 0, '10014'),
Text(2, 0, '10011'),
Text(3, 0, '10003'),
Text(4, 0, '10023'),
Text(5, 0, '10028'),
Text(6, 0, '10022'),
Text(7, 0, '10021'),
Text(8, 0, '10128'),
Text(9, 0, '10036'),
Text(10, 0, '10025'),
Text(11, 0, '11201'),
Text(12, 0, '11217'),
Text(13, 0, '11231'),
Text(14, 0, '11215'),
Text(15, 0, '11234'),
Text(16, 0, '10305'),
Text(17, 0, '10308'),
Text(18, 0, '11434'),
Text(19, 0, '10312'),
Text(20, 0, '10306'),
Text(21, 0, '10304')])
Bar-graph for the zipcode vs annual income
data_inc = data_final.copy()
data_inc= data_inc.sort_values(['annual_income'], ascending=False)
fig, axs = plt.subplots(1,1,figsize=(11,9))
ax = sns.barplot(x="zipcode", y="annual_income", data=data_inc)
ax.set_xlabel('Zipcodes',fontsize = 15) #xlabel
ax.set_ylabel('Annual Income', fontsize = 15)#ylabel
ax.set_title('Zip code Vs Annual Income', fontsize= 20)
plt.xticks(rotation=90)
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21]),
[Text(0, 0, '10013'),
Text(1, 0, '10011'),
Text(2, 0, '10036'),
Text(3, 0, '10014'),
Text(4, 0, '10028'),
Text(5, 0, '10023'),
Text(6, 0, '10025'),
Text(7, 0, '10022'),
Text(8, 0, '10003'),
Text(9, 0, '11201'),
Text(10, 0, '11217'),
Text(11, 0, '11215'),
Text(12, 0, '10128'),
Text(13, 0, '11231'),
Text(14, 0, '10021'),
Text(15, 0, '10312'),
Text(16, 0, '11434'),
Text(17, 0, '10305'),
Text(18, 0, '11234'),
Text(19, 0, '10306'),
Text(20, 0, '10304'),
Text(21, 0, '10308')])
Bar-graph for the zipcode vs Investment Recovery time
data_rec = data_final.copy()
data_rec= data_rec.sort_values(['Invest_Rec'], ascending=False)
fig, axs = plt.subplots(1,1,figsize=(11,9))
ax = sns.barplot(x="zipcode", y="Invest_Rec", data=data_rec)
ax.set_title('Zip code Vs Investment Recovery time', fontsize= 20)
ax.set_xlabel('Zipcodes',fontsize = 15) #xlabel
ax.set_ylabel('Investment Recovery Time', fontsize = 15)#ylabel
plt.xticks(rotation=90)
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21]),
[Text(0, 0, '10013'),
Text(1, 0, '10023'),
Text(2, 0, '10036'),
Text(3, 0, '10014'),
Text(4, 0, '10003'),
Text(5, 0, '10021'),
Text(6, 0, '10128'),
Text(7, 0, '10028'),
Text(8, 0, '10025'),
Text(9, 0, '10022'),
Text(10, 0, '11201'),
Text(11, 0, '11215'),
Text(12, 0, '11231'),
Text(13, 0, '10305'),
Text(14, 0, '10011'),
Text(15, 0, '11217'),
Text(16, 0, '11434'),
Text(17, 0, '11234'),
Text(18, 0, '10308'),
Text(19, 0, '10306'),
Text(20, 0, '10304'),
Text(21, 0, '10312')])
Bar-graph for the zipcode vs ten year profitability index
data_pi = data_final.copy()
data_pi= data_pi.sort_values(['ten_year_PI'], ascending=False)
fig, axs = plt.subplots(1,1,figsize=(11,9))
ax = sns.barplot(x="zipcode", y="ten_year_PI", data=data_pi)
ax.set_title('Zip code Vs ten year profitability index', fontsize= 20)
ax.set_xlabel('Zipcodes',fontsize = 15) #xlabel
ax.set_ylabel('Ten Year Profitability Index', fontsize = 15)#ylabel
plt.xticks(rotation=90)
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21]),
[Text(0, 0, '10312'),
Text(1, 0, '11434'),
Text(2, 0, '11215'),
Text(3, 0, '10305'),
Text(4, 0, '10025'),
Text(5, 0, '10036'),
Text(6, 0, '11217'),
Text(7, 0, '10306'),
Text(8, 0, '11234'),
Text(9, 0, '11201'),
Text(10, 0, '10304'),
Text(11, 0, '11231'),
Text(12, 0, '10028'),
Text(13, 0, '10022'),
Text(14, 0, '10023'),
Text(15, 0, '10011'),
Text(16, 0, '10128'),
Text(17, 0, '10003'),
Text(18, 0, '10014'),
Text(19, 0, '10013'),
Text(20, 0, '10021'),
Text(21, 0, '10308')])
Let's do a deeper analysis by using a scatter plot. Consider the listings with no.of.reviews > 2 and rename the columns for better understanding
data_visual = data_final[data_final['number_of_reviews'] > 2]
data_visual = data_visual.groupby(['zipcode']).price.count()
data_visual = data_visual.reset_index()
data_visual = data_visual.rename(columns={'price':'properties'})
data_vis = pd.merge(data_final,data_visual, on='zipcode', how='left')
room_type_price=room_type_price.rename(columns={'price':'properties'})
data_vis.rename(columns = {'neighbourhood_group_cleansed':'area'}, inplace=True)
data_vis['cost'] = data_vis['cost'].fillna(0)
data_vis['cost'] = data_vis['cost'].astype(int)
data_vis.head(2)
| zipcode | area | room_type | bedrooms | latitude | last_review | longitude | city | price | cleaning_fee | square_feet | minimum_nights | maximum_nights | number_of_reviews | review_scores_rating | availability_365 | cost | annual_income | Invest_Rec | ten_year_PI | properties | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10305 | Staten Island | Entire home/apt | 2.0 | 40.595265 | NaN | -74.064715 | Staten Island | 120.0 | 100.0 | NaN | 7.0 | 8.0 | 0.0 | NaN | 0.0 | 425100 | 47450.0 | 8.958904 | 2.116208 | 4.0 |
| 1 | 10305 | Staten Island | Entire home/apt | 2.0 | 40.593469 | NaN | -74.069137 | Staten Island | 110.0 | 70.0 | NaN | 2.0 | 7.0 | 0.0 | NaN | 197.0 | 425100 | 40332.5 | 10.539887 | 1.948777 | 4.0 |
Return of Investment(ROI) quadrant
plotly.offline.init_notebook_mode()
fig = px.scatter(data_vis, x="annual_income", y="Invest_Rec", color="properties", size="cost",
hover_data=['cost','area','zipcode'],
title='ROI Quadrant : Bottom-Right - High Revenue & Quick BreakEven')
fig.show()
Observations:
Best popular market with higher number of properties to invest in is Manhattan 10013 property with 3.3 million and earn $225K as rent per year. Break Even point is 14.5 years.
Other best markets to invest in with quick break-evens are in Staten Island and Queens with below zipcodes:
As we have past few years of price data is available, we can do time series analysis to better predict the current median price and use this for profitability analysis.
Though we had many columns available, due to large amount of missing data we couldn't use them for our analysis, we could use clustering techniques for imputation of missing values.
If weekly and monthly prices had a lower missing percentage, we could have imputed by calculating using the below formula.
weekly_price = (price7)
monthly_price = (weekly_price 4.3)
Note: Considering average week for a month as (52 weeks in a year / 12 months a year) = 4.3
May use NLP sentiment Analysis on summary and find the strength and weaknesses of each zip code.
Improving the marketing over 3 bedrooms houses with deeper analysis
Since price for zillow data is available only for 25 zipcodes of New York city, more data can be produced for zipcodes where data is unavailable.
Tableau can be used for better interactive dashbords.
https://www.investopedia.com/articles/basics/10/guide-to-calculating-roi.asp
https://www.investopedia.com/terms/p/profitability.asp
https://towardsdatascience.com/exploratory-data-analysis-eda-a-practical-guide-and-template-for-structured-data-abfbf3ee3bd9
https://stackoverflow.com/questions/66686072/size-legend-for-plotly-express-scatterplot-in-python
https://developers.google.com/maps/documentation/geocoding/start